In this paper we present an algorithm to compute risk-averse policies in Markov Decision Processes (MDPs) when the total cost criterion is used together with the average value at risk (AVaR) metric. Risk-averse policies are needed when large deviations from the expected behavior may have detrimental effects, an aspect that conventional MDP algorithms usually ignore. We provide conditions on the structure of the underlying MDP ensuring that approximations to the exact problem can be derived and solved efficiently. Our findings are novel inasmuch as average value at risk has not previously been considered in association with the total cost criterion. Our method is demonstrated in a rapid deployment scenario, in which a robot must reach a target location within a temporal deadline and increased speed is associated with an increased probability of failure. We show that the proposed algorithm not only produces a risk-averse policy reducing the probability of exceeding the temporal deadline, but also provides the statistical distribution of costs, thus offering a valuable analysis tool.
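As background (this is the standard Rockafellar–Uryasev formulation of AVaR, also known as conditional value at risk, not necessarily the authors' exact notation), the AVaR at confidence level \(\alpha \in (0,1)\) of a total cost \(C\) can be written as
\[
\mathrm{AVaR}_\alpha(C) \;=\; \min_{s \in \mathbb{R}} \left\{ s + \frac{1}{1-\alpha}\,\mathbb{E}\big[(C - s)^+\big] \right\},
\]
i.e., roughly the expected cost over the worst \(1-\alpha\) fraction of outcomes, which is why optimizing it yields policies that guard against large deviations rather than only the mean.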